MEMORANDUM FOR: Deputy

FROM          : Deputy Director of Data Processing

SUBJECT       : Proposal for a Centralized Community
                Bibliographic and Document Retrieval System
                Operated by CIA

REFERENCE     : OCR memo, same subject, dated 19 October 1978

1. Attached is a copy of a proposal [illegible] Director of Central
Reference, [illegible] and document retrieval [illegible] AEGIS. The
proposal would demonstrate to [illegible] information handling [illegible]
participate in [illegible] and other efforts to [illegible] statistical
trend report showing among other things the number of terminal queries
handled by OCR's central Libraries Division in Fiscal 1978. Accompanying
the statistical report is a copy of a note from D/OCR to D/ODP explaining
a few of the entries [illegible].

[illegible] cost [illegible] needed for implementation are reasonable
and defensible.

3. I have been asked to put together the IHC presentation. Before I do
so I would appreciate receiving your comments on the proposal
[illegible] and on its time and cost [illegible] views on a question


UNCLASSIFIED UPON REMOVAL 

Approved For Release [release date and document number illegible]





written or telephonic requests with electrical (voice or 
narrative) two way communications between the requestor and 
the intermediary (the OCR area reference analyst) , permitting 
the results of the query to be transmitted to the requestor 
electrically. 

4. I will be sharing the OCR memorandum with the 
Office of Communications and eliciting comments from them 
about the communications implications of such a proposal. 


5. It would be helpful to me if I could have your 
comments by 17 November. 


Atts: a/s 


Distribution : 

1 - ea. Addressee 

2 - O/D/ODP 


O/D/ODP 


paj/3 November 1978 

10 OCT 1978


MEMORANDUM FOR: CIA Member, Intelligence Information 
Handling Committee 

FROM : H. C. Eisenbeiss 

Director of Central Reference 

SUBJECT : Proposal for a Centralized Community Bibliographic 

and Document Retrieval System Operated by CIA 


1. This memorandum discusses the advantages of adapting CIA's
RECON 1/ retrieval system for intelligence documents to serve as the basis
for a centralized bibliographic and document retrieval system to serve
all NFIB 2/ agencies. The memorandum also addresses how such a system
could be configured, what services could be provided, how long it would
take to implement the system, some tentative estimates as to the possible
costs involved and various methods of funding its development and operation.
The proposal at this stage is purposefully conceptual and brief, and the
cost estimates are extremely conjectural. If you and the other IHC
members feel the idea is worth further exploration, additional work by
an interagency task force will be required to flesh out exactly how such
a system might be brought to reality.

2. The proposed system would be composed of two rather distinct 
subsystems, namely: a) a bibliographic retrieval subsystem wherein 
document citations dealing with specific search criteria would be 
provided to the intelligence analyst, and b) a document retrieval 
subsystem which would provide the analyst with copies of the relevant 
document images themselves in either soft copy, paper or microfiche. 

The system's total cost to the government would be mitigated by the 
savings it would achieve by making unnecessary certain duplicate and 
redundant systems in the Intelligence Community. 


1/ RECON is the on-line version of what is generally referred to as 
the AEGIS system. AEGIS operates primarily in the batch mode but RECON 
uses an inverted file technique enabling faster access to the data. 

2/ Defined as CIA, State/INR, DIA, the Military Service's Intelligence 
Branches, NSA, Treasury Department, DOE and FBI. 

3. Various possible options and means of configuring this system
exist, including arrangements involving centralized file creation of
both bibliographic and microfilm records combined with decentralized
retrieval service (wherein copies of magnetic tapes and the filmed
documents would be transmitted on a regular basis to individual agencies
for their own use). A number of these options are explored in this
paper but not the "centralized/decentralized" approach. Such an
arrangement, though technically feasible, is believed to present too many
disadvantages in its implementation and operation to warrant further
examination.

Why Use RECON?

4. The RECON subject file, from which the proposed Community data
base would be derived, has several advantages over other computer-based
document indexing systems currently used by NFIB agencies. Initiated in
1968, the RECON file is the largest and most comprehensive subject index
to intelligence reports in the Community. As of September 1978 the file
contained 3,000,000 index records. RECON offers access to virtually all
substantive intelligence documents originated (given general distribution)
by the CIA, DoD, DIA, Air Force, Army, Navy, NSA, State, and NPIC, and
some documents from other government agencies of the United States
[redacted]. The data base contains both raw and finished
intelligence reports, includes both collateral intelligence and Sensitive
Compartmented Information (SCI), and the area coverage is world-wide.
Subjects indexed include government, politics, society, culture, science
and technology, transportation, communications, business, commerce,
industry, finance, commodities (both strategic and non-strategic),
products (civilian and military), resources (including labor and military
manpower), and the armed forces. In brief, no area of interest to
intelligence is overlooked. Open literature, non-CIA cables, and [redacted]
reporting are included on a selective basis.

5. The full RECON data base is stored in machine-readable form 
and is searchable by computer via any one or a combination of the 
elements used to describe each document. These include the bibliographic 
description (title, issuing agency, post of origin, date, report number,
security classification and dissemination restrictions); area codes 
(China and the Soviet Union are subdivided to the province and oblast 
level, respectively); specific place names where appropriate; subject 
codes; and keywords. The 320 subject codes are standardized broad 
subdivisions, more than one of which can be assigned to any single 
document by the indexers in CIA's Office of Central Reference (OCR). 

The keywords are non-standardized terms added by the indexer based on
review of the title and document text; these individual keywords supplement
the broader subject codes and thus refine the retrievability of each 
individual document. The flexibility of such an indexing system allows 
it to easily accommodate new subject indexing requirements. 
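
The inverted file technique mentioned in footnote 1, together with searching "via any one or a combination of the elements," can be illustrated with a minimal sketch in Python. The record layout, field names, code values and sample records are all hypothetical; nothing here is taken from the actual RECON file design.

```python
from collections import defaultdict

# Hypothetical index records; field names and values are illustrative only.
RECORDS = [
    {"id": 1, "area": "UR", "subjects": {"ECON", "TRANS"}, "keywords": {"railways"}},
    {"id": 2, "area": "CH", "subjects": {"ECON"}, "keywords": {"steel", "exports"}},
    {"id": 3, "area": "UR", "subjects": {"MIL"}, "keywords": {"railways", "logistics"}},
]


def build_inverted_file(records):
    """Map each (element, value) pair to the set of record ids that carry it."""
    index = defaultdict(set)
    for rec in records:
        index[("area", rec["area"])].add(rec["id"])
        for code in rec["subjects"]:
            index[("subject", code)].add(rec["id"])
        for word in rec["keywords"]:
            index[("keyword", word)].add(rec["id"])
    return index


def search(index, **criteria):
    """Return ids of records matching every given element (logical AND)."""
    postings = [index.get((element, value), set()) for element, value in criteria.items()]
    if not postings:
        return set()
    return set.intersection(*postings)


INDEX = build_inverted_file(RECORDS)
print(sorted(search(INDEX, area="UR", keyword="railways")))                 # [1, 3]
print(sorted(search(INDEX, area="UR", subject="MIL", keyword="railways")))  # [3]
```

Because each element value points directly at the records that carry it, a query only intersects a few short lists rather than scanning every record, which is the general reason an inverted file gives faster access than a sequential batch pass.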

6. RECON has an historical depth of 10 years and is the most up- 
to-date general purpose subject index to intelligence documents available. 
Approximately 85-90 percent of incoming documents are available for 
computer search of the index records within eight days after receipt, 
and by July 1979 this figure will be reduced to three days. Portions of 
the RECON data base are now available to the Community via COINS, and 
the total data base itself has been queried on a limited basis by OCR 
analysts for all NFIB agencies continually since its development. When 
CIA's earlier bibliographic retrieval system, known as "Intellofax," was
in operation, non-CIA use of the CIA index to intelligence reports
was about 45 percent of total queries. With the initiation of the
AEGIS/RECON system in 1967-68, however, CIA management placed severe
limits on other agency access to these bibliographic records because of
substantial reductions imposed on CIA resources. Even under this
restriction, however, non-CIA use of the data base has crept upward, and
during the first half of CY 1978 the entire data base was queried over
800 times by non-CIA NFIB agencies (approximately 26% of total queries
during this period). During the same period, the finished intelligence
portion of the RECON data base, which is part of the COINS system, was
queried via COINS by non-CIA NFIB agencies over 1,200 times.
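
The two figures quoted for the first half of CY 1978 imply a rough total query volume. The short calculation below is only a sketch of that inference; since the memo says "over 800" and "approximately 26 percent," the results are order-of-magnitude estimates, not reported statistics.

```python
# Figures quoted in paragraph 6 (first half of CY 1978).
non_cia_queries = 800      # "over 800", so treat as a lower bound
non_cia_share = 0.26       # "approximately 26%" of total queries

# Implied total and CIA-originated queries against the entire data base.
implied_total = non_cia_queries / non_cia_share
implied_cia = implied_total - non_cia_queries

print(f"implied total queries: about {implied_total:,.0f}")
print(f"implied CIA queries:   about {implied_cia:,.0f}")
```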

The Bibliographic Subsystem -- Alternative Configurations
And Cost Estimates


Option One: Retrieval Through Intermediaries

7. The least costly approach of providing RECON bibliographic 
records to the Community would simply entail offering increased service 
from the system in its present configuration to other NFIB members. 

Under this arrangement, a non-CIA analyst presents his research request 
in writing or over the phone to an OCR area reference analyst, who 
queries the RECON data base and then mails the printed listing of 
records to the original requester. 

8. The primary disadvantages of this system are the delays 
involved in having to mail the request and the document listing. The 
existence of an intermediary (the OCR area reference analyst) between 
the end user of the data and the data base itself can also be a dis- 
advantage, but not without some positive aspects. Among the disadvantages, 
the requester may have no way of knowing how large or small a document 
listing he will be getting until he receives it from the area reference
analyst. Any revision of his query to make his request either more
inclusive, more selective, or otherwise more appropriate for retrieving
precisely what he needs can only be made after the query has been run
and the complete document listing is received through the mail. On the
positive side, the intermediary reference analyst usually has a better
knowledge than the requester of the subject indexing codes and keywords
(including how they have been used), and he can often translate the
requester's needs into a more effectively worded query than if the
requester is left to his own devices.

9. The following costs are foreseen if the current system of 
Community access to RECON is simply expanded. About 8-10 more document 
indexers and dissemination personnel would be needed to process the 
additional material expected to be added to the data base, in addition 
to indexing certain categories of documents in greater depth to satisfy 
the anticipated specific needs of various agencies. An additional 
typist would be necessary for the added input to the data base. Two 
additional camera operators would be needed in OCR's Microform Processing 
Branch to handle the increased volume of incoming documents to be filmed. 
Fifteen more area reference analysts would be needed to handle the added
volume of requests. 1/ At least two more clerks would be needed to
address and package listings for mailing and to prepare document and
courier receipts. An additional direct access storage unit would have
to be leased in order to store the greater number of document citations
in the data base. No additional computer equipment, software, personnel 
or floor space would be required. These operating expenses would probably 
total more than $500,000 per year. (See the attached table for a summary 
of all cost estimates.) 
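
The staffing increases listed in paragraph 9 can be tallied as follows. This is a simple sketch; the category labels are paraphrased from the paragraph, and "at least two" clerks is treated as exactly two.

```python
# Additional positions listed in paragraph 9, as (low, high) ranges.
ADDED_POSITIONS = {
    "document indexers and dissemination personnel": (8, 10),
    "typist": (1, 1),
    "camera operators": (2, 2),
    "area reference analysts": (15, 15),
    "mailing and receipt clerks": (2, 2),  # "at least two"
}

low = sum(lo for lo, _ in ADDED_POSITIONS.values())
high = sum(hi for _, hi in ADDED_POSITIONS.values())
print(f"additional positions: {low} to {high}")
```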

Option Two: Direct On-Line Retrieval

10. If CIA's RECON data base is to be made available to all other 
NFIB agencies, there is a preferred alternative to merely expanding the 
operation described above. This would be to provide on-line access to 
the data base (stored at CIA Headquarters) via remote visual display 
terminals (VDTs) in other agencies. Such access could be made available 


1/ It is extremely difficult to accurately estimate the number of
index search requests that would be levied on CIA if RECON were made 
available to the Community without restriction. However, for the 
purposes of this memo, it is assumed that the current level of requests 
would increase five-fold. (This figure is largely a guess, based partly 
on OCR's experience with non-CIA requesters before controls were imposed 
on their use of the RECON data base.) 
on a 24-hour/day basis if necessary. Bibliographic references displayed
on these remote VDTs could be printed immediately on medium-speed (300
lines/minute) printers co-located at each VDT. In this connection it
should be pointed out that since the fall of 1973 a variety of intelligence
analysts in CIA have been successfully querying the entire RECON data
base directly via the SAFE Interim System 1/ remote VDTs without OCR
intervention. These analysts were formally trained to search the data 
base and are provided with guidance when necessary. 

11. The principal advantages of this arrangement include the 
significantly faster availability of the document citations to the 
analyst, plus the capability for the analyst to work directly with the 
data base. The latter feature would enable the analyst to determine if 
the subject codes and keywords he had chosen were producing references 
to the kinds of documents he needed; he could also see how large his 
document listing would be and modify his query parameters if necessary. 

All this could be done before ordering a printout from the system. For 
standing requests for index searches the capability to query the data 
base via the batch mode would be retained, rather than requiring the 
analyst to repeatedly compose his query at a terminal. 

12. If the on-line arrangement outlined is adopted, existing data 
communications systems such as the COINS network should be able to 
handle the transmission of the RECON bibliographic records from CIA
Headquarters to requester terminals located at other NFIB agencies. 

Assuming that the COINS network were used, the following tasks would 
have to be undertaken. A dedicated host computer would have to be 
installed and the RECON system software would have to be modified to 
make the computer program "reentrant," an arrangement enabling the 
central processing unit to handle up to 50 on-line requesters simul- 
taneously. This would entail a one-time payment to a contractor, and 
would require approximately three man-years of his work and one calendar- 
year of time. An extra programmer and technician would each be needed 

in OCR's computer support unit to work with the contractor during the 
software modification and later to maintain this software and troubleshoot 
the system's operation. 
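
The idea behind a "reentrant" program can be sketched in modern terms: one copy of the code serves many requesters at once because each invocation keeps its working state to itself. The example below uses Python threads purely as an analogy; the index contents and function name are hypothetical, and it says nothing about how the actual RECON software would be modified.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shared, read-only index: keyword -> document numbers.
INDEX = {
    "railways": (101, 205, 310),
    "steel": (205, 411),
}


def run_query(keyword):
    """Reentrant search: every piece of working state is local to the call.

    Nothing is written to module-level variables, so any number of
    requesters can execute this same code at the same time.
    """
    hits = []                                  # private to this invocation
    for doc_number in INDEX.get(keyword, ()):
        hits.append(doc_number)
    return keyword, hits


# Simulate 50 simultaneous on-line requesters sharing one copy of the code.
requests = ["railways", "steel"] * 25
with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(run_query, requests))

assert all(hits == list(INDEX[kw]) for kw, hits in results)
print(len(results), "queries answered")
```

Had `hits` been a single module-level list shared by all callers, concurrent requests could interleave their results, which is the kind of conflict a reentrant design avoids.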

13. In addition to making the host computer operational for 
RECON, a number of other tasks would be required. The software
interfaces connecting the computer, the message processor, and the COINS
network would have to be developed. Certain additional software and 
hardware changes would be needed to adapt the RECON system to accommodate 


1/ This is the precursor of the ultimate SAFE system, designed to
assist in all aspects of intelligence production. 

an increased number of users. Also, some combination of software
modifications and human intervention may be required to resolve security 
release problems. If all the necessary equipment were bought outright, 
the investment expenses are estimated to be about $2,700,000. 

14. If the necessary equipment were rented instead of purchased 
outright, its cost is estimated at about $780,000 per year, including 
maintenance. 

15. The annual operating costs would include an additional computer 
programmer, a computer technician, and three more computer operators, 
plus higher equipment maintenance costs. The total of these operating 
costs is estimated to be about $175,000 per year. 

16. In addition to the extra personnel--including indexers and
microphotographers--already mentioned, a centralized staff of about
three or four people ($60-80,000/year) would probably be necessary to
coordinate new indexing requirements from participating agencies; to 
train personnel to use the system and to provide on-going guidance once 
the system enters operation; and to handle trouble calls and transmit 
questions to appropriate operating personnel. 
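
As a cross-check, the annual figures quoted so far can be summed. Taking the upper end of the coordination staff estimate, they add up to the $755,000 shown on the Staffing line for Option 2 in the attached table. Whether the table figure was actually built up this way is an assumption; the memo does not say.

```python
# Annual figures quoted in the text, in dollars.
option_one_operating = 500_000   # paragraph 9: "more than $500,000 per year"
online_operating = 175_000       # paragraph 15
coordination_staff = 80_000      # paragraph 16: upper end of "$60-80,000/year"

total = option_one_operating + online_operating + coordination_staff
print(f"${total:,}")  # $755,000
```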

The Document Retrieval Subsystem -- Alternative Configurations
And Cost Estimates


17. If a centralized document retrieval service in CIA is envisaged 
to supplement the centralized bibliographic retrieval service, then the 
CIA's current document retrieval system would have to be significantly 
enhanced to accommodate the increased work load. The system as it now 
operates is capable only of handling the present request load. For this 
reason future requests for copies of documents, whether generated by 
either of the bibliographic retrieval options discussed above, would 
have to await implementation of the CIA's Automated Document Storage and 
Retrieval (ADSTAR) system, scheduled to enter operation within CIA in 
November 1979. Like the bibliographic retrieval system discussed above, 
the ADSTAR document retrieval system could operate in either a batch or 
on-line mode. In either mode, ADSTAR employs digitized images in its 
document retrieval and display processing, and present plans call for 
transmitting such document images directly to CIA user analysts at their 
remote locations over an upgraded communications network implemented as 
part of the SAFE system. 

Option One: Batch Mode

18. Under this configuration the ADSTAR system within CIA would 
produce copies of documents after receiving a request for them either 
via a document listing sent through the mail (Bibliographic Retrieval
Option 1, discussed in paragraph 7) or via a command entered by the 
requester on his remote terminal in another NFIB agency (Bibliographic 
Retrieval Option 2). These documents would then be mailed to the requester. 

19. The costs 1/ of such a document retrieval system can be separated
into investment and operating expenses. An ADSTAR system augmented to
provide Community-wide service would require approximately eight more
storage modules to accommodate the assumed 25 percent increase in the
number of documents five years old or less that are to be stored in that
portion of the system designed to provide immediate retrieval. (These
need not be added all at once; two per year could probably take care of
the expected annual ADSTAR file growth.) Larger central processing
units would be needed to accommodate the greater number of index records
and associated support files. For the same reasons more disk packs and
disk drives would be needed, the buffer capacity would have to be
doubled and at least one other high-speed printer would have to be
acquired. If this new centralized document service were to result in a
demand for more documents in microfiche, the microfiche output capability
would have to be greatly enhanced. Finally, software modifications to
the ADSTAR system would be needed. These would all be one-time investment
costs, and, while extremely conjectural, would probably total over
$1,000,000.

20. The increased operating costs anticipated for an expanded 
ADSTAR system would include two additional personnel to intervene in the 
ADSTAR process to resolve document release questions. Two extra clericals 
would be needed for packaging, mailing, and preparing document and 
courier receipts for batch requests for documents. Maintaining the 
various expanded support files (e.g., MIS and Security Access) would 
require another full-time employee. For preventive maintenance of the 
additional equipment, the maintenance contract would cost more. These 
operating costs would probably come to about $150,000 per year. 

Option Two: Direct On-Line Retrieval

21. In its most sophisticated configuration, remote ADSTAR terminals 
located throughout the Intelligence Community could allow non-CIA 


1/ For the purposes of estimating costs, it is assumed that the
number of documents processed into the data base will increase by 25%
above the present level. This figure is based on the current volume of
cables and other material (consisting primarily of finished intelligence
produced by various unified and specified military commands) received by
CIA that is now being processed on a selective basis only into the RECON
data base.
requesters to query the CIA's central ADSTAR library and display the
text and print hard copies of whichever documents the NFIB analyst 
selected from his RECON listing. 

22. Such an on-line document retrieval system, however, could not 
be developed on the basis of existing data communications systems, such 
as the COINS network. This is because the bandwidth capacity to handle 
ADSTAR document image transmissions, which consist of approximately four 
million bytes per page image, is not available in existing Community 
networks. The data transmission problem could be eased somewhat by 
using advanced data compression techniques, but even such a compressed 
data transmission would require an estimated one million bytes per page 
image. 
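
The scale of the bandwidth problem can be made concrete with a rough calculation. The page sizes below come from paragraph 22; the two line speeds are assumed purely for illustration and are not stated anywhere in the memo. Protocol and encryption overhead are ignored, so real transfer times would be longer.

```python
def transmission_seconds(page_bytes, line_bits_per_second):
    """Time to send one page image, ignoring protocol and encryption overhead."""
    return page_bytes * 8 / line_bits_per_second


PAGE_SIZES = {
    "uncompressed": 4_000_000,  # paragraph 22: about four million bytes per page
    "compressed": 1_000_000,    # paragraph 22: about one million bytes per page
}

# Assumed line speeds in bits per second; illustrative only, not from the memo.
LINE_SPEEDS = (9_600, 50_000)

for label, size in PAGE_SIZES.items():
    for speed in LINE_SPEEDS:
        minutes = transmission_seconds(size, speed) / 60
        print(f"{label:>12} page at {speed:>6,} bit/s: {minutes:5.1f} min")
```

Under these assumed speeds even a compressed page image takes minutes rather than seconds to deliver, which is consistent with the memo's conclusion that existing networks could not carry on-line document images.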


23. Development of such an on-line document retrieval system, 
compared to the ADSTAR batch mode, would require additional outlays for 
a central processing unit of greater capacity, more software, and (most 
importantly) the communications system hardware; the latter would include 
the communication lines themselves as well as the interface equipment, 
encryptors, decryptors, and remote access and display stations. Also, 
as with the on-line bibliographic retrieval system, appropriate measures 
would have to be taken to handle security release problems before this 
system is implemented. We cannot estimate the total of these additional 
costs without tasking communications specialists to undertake a study of 
the problem, but undoubtedly the costs would be substantial. 

Funding 


24. Funding could be accomplished in at least four different ways, 
each of which has its advantages and disadvantages. One possible method 
involves user agencies supplying personnel to CIA according to a ratio 
proportionate to the additional input burdens each agency would impose
on the RECON system plus the use each agency made of the system. This
method has been used between CIA and NSA for reference support under
Project Millstream. Its applicability when a number of agencies are
concerned, however, is questionable. There is the problem of allocation
of manpower compensation from individual agencies whose costs to the
system are fractions of man-years. There are also the problems attendant
with periodic replacement of personnel and with the loss of control by 
CIA in applying its own personnel selection procedures and standards to 
all of the people working in the CIA. 
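
The "fractions of man-years" difficulty can be seen in a small sketch of proportional allocation. The agency names, the number of positions and the relative shares below are entirely hypothetical and are chosen only to show how the arithmetic behaves.

```python
from fractions import Fraction


def allocate_positions(total_positions, burden_shares):
    """Split a staffing requirement in proportion to each agency's share."""
    total_share = sum(burden_shares.values())
    return {
        agency: Fraction(total_positions * share, total_share)
        for agency, share in burden_shares.items()
    }


# Hypothetical inputs: 10 positions and made-up relative shares of input and use.
shares = {"Agency A": 5, "Agency B": 3, "Agency C": 1}
allocation = allocate_positions(10, shares)

for agency, positions in allocation.items():
    print(f"{agency}: {float(positions):.2f} man-years")
```

None of the resulting shares is a whole number of people, so each agency would owe a fraction of a position, which is the allocation problem the paragraph describes.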

25. A second alternative would be to have user agencies transfer 
funds to the CIA to pay for their portion of the input and use made of 
the RECON/ADSTAR system. This would be similar to an arrangement during 
the 1950's and early 1960's between the State Department and the CIA, 
whereby the latter transferred funds to the State Department to pay for
the CIA's use of State Department biographic files. This approach is 
easier to arrange and manage than the transfer of personnel, but is 
complicated by the situation in which a number of agencies must defend a 
portion of their budgets that are allocated to a program run by another 
agency. Furthermore, this alternative does not address the question of 
personnel, so a situation could arise in which the CIA had enough money, 
but had not been authorized enough additional slots for the people 
needed to operate the system. 

26. A third way would be to have those developing and operating 
costs of the system that are associated with Community service (including 

the additional positions required) made part of the budget of the Intelligence 
Information Handling Committee (IHC) and to charge the IHC with defending 
this portion of its budget each year before Congress. A peculiarity 
associated with this arrangement would be that the investment and operating 
funds for an essentially integrated system would have to be split between 
two budgetary sources, and potential complications could develop if 
differing budgetary priorities ever arose between the IHC and the CIA. 

27. The fourth possible method would be to increase CIA/OCR's 

budget to allow it to finance the development and operation of the 
system itself. Such a proposal was made by OCR as an "enhanced" option 
in its FY 1980 program call, but it was rejected. If adopted, however, 
it would have the advantage of administrative simplicity and would avoid
any complications arising from splitting the source of funds for developing
and operating the system among different organizations.

Time for Implementation

28. Any planned expansion of the CIA's bibliographic and document 
retrieval system would require a thorough and detailed study of at least 
six months' duration, plus time to hire whatever additional personnel 
the study will have called for. 

29. The maximum Community-wide service that could then be implemented 
would be batch bibliographic retrieval via OCR area reference analysts, 
with document retrieval accomplished through each NFIB agency's own 
document library. This arrangement could be set up as soon as additional 
service personnel were hired, possibly as early as six months after 
completion of the initial six-month preliminary study, assuming that the 
requisite floor space could be acquired. 


30. The more advanced approach of providing on-line bibliographic
access would probably require at least two years after completion of the 
initial six-month study. During this period, software modifications 
would have to be accomplished, additional equipment would have to be 
acquired and installed, and non-CIA agencies would have to program their 
budgets for the communications equipment and remote terminals they must 
fund. 


31. Centralized document retrieval would be impossible for the CIA 
in either a batch or on-line configuration until after the ADSTAR system 
had been implemented and operationally tested for at least six months. 
This would make ADSTAR available for Community-wide use no earlier than 
June 1980, and then only for batch retrieval. 

32. An on-line ADSTAR system that serviced non-CIA agencies via 
remote work stations would take at least two years for programming user- 
agency budgets, and acquiring and installing the necessary additional 
equipment. 
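
The earliest-case schedule for the bibliographic options in paragraphs 28 through 30 can be laid out in months from the start of the preliminary study. This sketch uses the minimum durations quoted and leaves out the unspecified hiring time, so the real dates could only be later.

```python
# Durations in months, counted from the start of the preliminary study.
STUDY_MONTHS = 6          # paragraph 28: "at least six months' duration"
BATCH_AFTER_STUDY = 6     # paragraph 29: "possibly as early as six months after"
ONLINE_AFTER_STUDY = 24   # paragraph 30: "at least two years after"

milestones = {
    "study complete": STUDY_MONTHS,
    "batch bibliographic service": STUDY_MONTHS + BATCH_AFTER_STUDY,
    "on-line bibliographic access": STUDY_MONTHS + ONLINE_AFTER_STUDY,
}

for name, month in milestones.items():
    print(f"month {month:>2}: {name}")
```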


Unexplored Issues


33. The foregoing examines some basic considerations regarding the
establishment of a centralized bibliographic and document retrieval
system. If the IHC feels this proposal is worth pursuing, then the 
questions of user requirements, system architecture, and precise invest- 
ment and operating costs would all have to be thoroughly researched. In 
addition, other unresolved issues relating to these and other aspects of 
the system would have to be studied in detail. These include security 
arrangements, floor space for machines and people, and the cost and 
funding of communication lines, printers, remote terminals and other 
equipment at participating agencies. Finally, we would want to examine 
what savings such a system would provide within the Community, either by 
reducing on-going activities or planned new ventures necessitating 
substantial expenditures in labor and hardware for systems now in the 
design stage.


H. C. Eisenbeiss

Attachment
Page 1 of 2


                       One-time Costs        Annual Costs

                                              $   24,000
                                                 500,000
                         $1,000,000              150,000
                                              $  674,000
                         $1,000,000 ÷ 5 =        200,000*
                                              $  874,000

* Pro rata annual share of initial one-time costs, assuming a system life of five years.
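
The arithmetic behind the starred figure is a straight-line spread of the one-time investment over the assumed five-year system life. The sketch below reproduces the page 1 totals; the function name is illustrative, and no discounting or inflation is applied because the attachment mentions none.

```python
def annualized_cost(one_time, annual, system_life_years=5):
    """Annual cost plus a straight-line share of the one-time investment."""
    return annual + one_time / system_life_years


# Totals from page 1 of the attachment.
print(f"${annualized_cost(1_000_000, 674_000):,.0f}")  # $874,000
```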

Option 2 - Direct On-Line Retrieval

                                     Purchase                             Lease
                         One-time Costs    Annual Costs     One-time Costs    Annual Costs

Hardware                   $2,700,000                                           $  780,000
Maintenance                                 $   70,000
Software Modification         500,000                         $  500,000
Staffing                                       755,000                             755,000
ADSTAR Costs                1,000,000          150,000         1,000,000           150,000

TOTAL                      $4,200,000       $  975,000        $1,500,000        $1,685,000
                                  ÷ 5 =        840,000*              ÷ 5 =         300,000*
                                            $1,815,000                          $1,985,000

* Pro rata annual share of initial one-time costs, assuming a system life of five years.
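
The purchase and lease totals can be compared directly, and the same straight-line method shows how the comparison depends on the assumed system life. This is a sketch under the attachment's own assumptions (no discounting, constant annual costs); the break-even calculation is an added illustration, not something the attachment reports.

```python
def annualized_cost(one_time, annual, system_life_years):
    """Annual cost plus a straight-line share of the one-time investment."""
    return annual + one_time / system_life_years


# Column totals from the Option 2 table.
PURCHASE = {"one_time": 4_200_000, "annual": 975_000}
LEASE = {"one_time": 1_500_000, "annual": 1_685_000}


def break_even_years(a, b):
    """System life at which options a and b have equal annualized cost."""
    return (a["one_time"] - b["one_time"]) / (b["annual"] - a["annual"])


for label, option in (("purchase", PURCHASE), ("lease", LEASE)):
    print(f"{label}: ${annualized_cost(**option, system_life_years=5):,.0f} per year")

print(f"break-even system life: {break_even_years(PURCHASE, LEASE):.1f} years")
```

On these figures purchase is the cheaper choice for any system life longer than about 3.8 years, and lease is cheaper for a shorter one.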

MEMORANDUM FOR: Mr. May

Page 2, Terminal Queries, is the data that is relevant 
to the COINS, etc kinds of questions. RSM, Rapid Search 
Machine, Delta Data is the link from the document library, 
into your data bases; IDF is the NPIC file target; NYTIB is 
the New York Times Information Bank; MEDLINE is the National 
Library of Medicine Clinical and Diagnostic file; COINS is
you know what; DDC is the Defense Documentation Center; 

SOLIS is the NSA file; NTIS is Commerce's National Technical 
Information Service; and Ready Reference is the Library 
Interim SAFE index to clippings. 


D/OCR 
1 Nov 78 

